<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
    <title>Selecting an access method</title>
    <link rel="stylesheet" href="gettingStarted.css" type="text/css" />
    <meta name="generator" content="DocBook XSL Stylesheets V1.73.2" />
    <link rel="start" href="index.html" title="Berkeley DB Programmer's Reference Guide" />
    <link rel="up" href="am_conf.html" title="Chapter 2.  Access Method Configuration" />
    <link rel="prev" href="am_conf.html" title="Chapter 2.  Access Method Configuration" />
    <link rel="next" href="am_conf_logrec.html" title="Logical record numbers" />
  </head>
  <body>
    <div xmlns="" class="navheader">
      <div class="libver">
        <p>Library Version 12.1.6.2</p>
      </div>
      <table width="100%" summary="Navigation header">
        <tr>
          <th colspan="3" align="center">Selecting an access method</th>
        </tr>
        <tr>
          <td width="20%" align="left"><a accesskey="p" href="am_conf.html">Prev</a> </td>
          <th width="60%" align="center">Chapter 2.  Access Method Configuration </th>
          <td width="20%" align="right"> <a accesskey="n" href="am_conf_logrec.html">Next</a></td>
        </tr>
      </table>
      <hr />
    </div>
    <div class="sect1" lang="en" xml:lang="en">
      <div class="titlepage">
        <div>
          <div>
            <h2 class="title" style="clear: both"><a id="am_conf_select"></a>Selecting an access method</h2>
          </div>
        </div>
      </div>
      <div class="toc">
        <dl>
          <dt>
            <span class="sect2">
              <a href="am_conf_select.html#idm140526461794912">Btree or Heap?</a>
            </span>
          </dt>
          <dt>
            <span class="sect2">
              <a href="am_conf_select.html#idm140526458043648">Hash or Btree?</a>
            </span>
          </dt>
          <dt>
            <span class="sect2">
              <a href="am_conf_select.html#idm140526459252880">Queue or Recno?</a>
            </span>
          </dt>
        </dl>
      </div>
      <p> 
        The Berkeley DB access method implementation unavoidably
        interacts with each application's data set, locking
        requirements and data access patterns. For this reason, one
        access method may result in dramatically better performance
        for an application than another one. Applications whose data
        could be stored using more than one access method may want to
        benchmark their performance using the different candidates. 
    </p>
      <p>
        One of the strengths of Berkeley DB is that it provides
        multiple access methods with nearly identical interfaces to
        the different access methods. This means that it is simple to
        modify an application to use a different access method.
        Applications can easily benchmark the different Berkeley DB
        access methods against each other for their particular data
        set and access pattern. 
    </p>
      <p>
        Most applications choose between using the Btree or Heap
        access methods, between Btree or Hash, or between Queue and
        Recno, because each of these pairs offer similar
        functionality.
    </p>
      <div class="sect2" lang="en" xml:lang="en">
        <div class="titlepage">
          <div>
            <div>
              <h3 class="title"><a id="idm140526461794912"></a>Btree or Heap?</h3>
            </div>
          </div>
        </div>
        <p> 
            Most applications use Btree because it performs well
            for most general-purpose database workloads. But there are
            circumstances where Heap is the better choice. This
            section describes the differences between the two access
            methods so that you can better understand when Heap might
            be the superior choice for your application.
        </p>
        <p> 
            Before continuing, it is helpful to have a high level
            understanding of the operating differences between Btree
            and Heap.
        </p>
        <div class="sect3" lang="en" xml:lang="en">
          <div class="titlepage">
            <div>
              <div>
                <h4 class="title"><a id="idm140526469694608"></a>Disk Space Usage</h4>
              </div>
            </div>
          </div>
          <p>
                The Heap access method was developed for use in
                systems with constrained disk space (such as an
                embedded system). Because of the way it reuses page
                space, for some workloads it can be much better than
                Btree on disk space usage because it will not grow the
                on-disk database file as fast as Btree. Of course,
                this assumes that your application is characterized by
                a roughly equal number of record creations and
                deletions.
            </p>
          <p>
                Also, Heap can actively control the space used by
                the database with the use of the <a href="../api_reference/C/dbset_heapsize.html" class="olink">DB-&gt;set_heapsize()</a>
                method. When the limit specified by that method is
                reached, no additional pages will be allocated and
                existing pages will be aggressively searched for free
                space. Also records in the heap can be split to fill
                space on two or more pages.
            </p>
        </div>
        <div class="sect3" lang="en" xml:lang="en">
          <div class="titlepage">
            <div>
              <div>
                <h4 class="title"><a id="idm140526457397840"></a>Record Access</h4>
              </div>
            </div>
          </div>
          <p> 
                Btree and Heap are fundamentally different because
                of the way that you access records in them. In Btree,
                you access a record by using the record's key. This
                lookup occurs fairly quickly because Btree places
                records in the database according to a pre-defined
                sorting order. Your application is responsible for
                constructing the key, which means that it is
                relatively easy for your application to know what key
                is in use by any given record. 
            </p>
          <p> 
                Conversely, Heap accesses records based on their
                offset location within the database. You retrieve a
                record in a Heap database using the record's Record ID
                (RID), which is created when the record is added to
                the database. The RID is created for you; you cannot
                specify this yourself. Because the RID is created for
                you, your application does not have control over the
                key value. For this reason, retrieval operations for a
                Heap database are usually performed using secondary
                databases. You can then use this secondary index to
                retrieve records stored in your Heap database. 
            </p>
          <p>
                Note that an application's data access requirements
                grow complex, Btree databases also frequently require
                secondary databases. So at a certain level of
                complexity you will be using secondary databases
                regardless of the access method that you choose.
            </p>
          <p>
                Secondary databases are described in <a class="xref" href="am_second.html" title="Secondary indexes">Secondary indexes</a>.
            </p>
        </div>
        <div class="sect3" lang="en" xml:lang="en">
          <div class="titlepage">
            <div>
              <div>
                <h4 class="title"><a id="idm140526457993408"></a>Record Creation/Deletion</h4>
              </div>
            </div>
          </div>
          <p> 
                When Btree creates a new record, it places the
                record on the database page that is appropriate for
                the sorting order in use by the database. If Btree can
                not find a page to put the new record on, it locates a
                page that is in the proper location for the new
                record, splits it so that the existing records are
                divided between the two pages, and then adds the new
                record to the appropriate page. 
            </p>
          <p> 
                On deletion, Btree simply removes the deleted
                record from whatever page it is stored on. This leaves
                some amount of unused space ("holes") on the page.
                Only new records that sort to this page can fill that
                space. However, once a page is completely empty, it
                can be reused to hold records with a different sort
                value.
            </p>
          <p> 
                In order to reclaim unused disk space, you must run
                the <a href="../api_reference/C/dbcompact.html" class="olink">DB-&gt;compact()</a> method, which attempts to fill holes
                in existing pages by moving records from other pages.
                If it is successful in moving enough records, it might
                be left with entire pages that have no data on them.
                In this event, the unused pages might be removed from
                the database (depending on the flags that you provide
                to <a href="../api_reference/C/dbcompact.html" class="olink">DB-&gt;compact()</a>), which causes the database file to be
                reduced in size.
            </p>
          <p>
                Both tree searching and page compaction are
                relatively expensive operations. Heap avoids these
                operations, and so is able to perform better under
                some circumstances.
            </p>
          <p>
                Heap does not care about record order. When a
                record is created in a Heap database, it is placed on
                the first page that has space to store the record. No
                sorting is involved, and so the overhead from record
                sorting is removed. 
            </p>
          <p>
                On deletion, both Btree and Heap free space within
                a page when a record is deleted. However, unlike
                Btree, Heap has no compaction operation, nor does it
                have to wait for a record with the proper sort order
                to fill a hole on a page. Instead, Heap simply reuses
                empty page space whenever any record is added that
                will fit into the space.
            </p>
        </div>
        <div class="sect3" lang="en" xml:lang="en">
          <div class="titlepage">
            <div>
              <div>
                <h4 class="title"><a id="idm140526457639888"></a>Cursor Operations</h4>
              </div>
            </div>
          </div>
          <p>
                When considering Heap, be aware that this access
                method does not support the full range of cursor
                operations that Btree does.
            </p>
          <div class="itemizedlist">
            <ul type="disc">
              <li>
                <p>
                        On sequential cursor scans of the database,
                        the retrieval order of the records is not
                        predictable for Heap because the records are
                        not sorted. Btree, of course, sorts its
                        records so the retrieval order is predictable.
                    </p>
              </li>
              <li>
                <p> 
                        When using a Heap database, you cannot
                        create new records using a cursor. Also, this
                        means that Heap does not support the <a href="../api_reference/C/dbcput.html" class="olink">DBC-&gt;put()</a>
                            <code class="literal">DB_AFTER</code> and
                            <code class="literal">DB_BEFORE</code> flags. You
                        can, however, update existing records using a
                        cursor. 
                    </p>
              </li>
              <li>
                <p>
                        For concurrent applications, iterating
                        through the records in a Heap database is not
                        recommended due to performance considerations.
                        This is because there is a good chance that
                        there are a lot of empty pages in the database
                        if you have a concurrent application. 
                    </p>
                <p>
                        For a Heap database, entire regions are
                        locked when a lock is acquired for a database
                        page. If there is then contention for that
                        region, and a new database page needs to be
                        added, then Berkeley DB simply creates a whole
                        new region. The locked region is then padded
                        with empty pages in order to reach the new
                        region. 
                    </p>
                <p>
                        The result is that if the last used page in
                        a region is 10, and a new region is created at
                        page 100, then there are empty pages from 11
                        to 99. If you are iterating with a cursor,
                        then all those empty pages must be examined by
                        the cursor before it can reach the data at
                        page 100. 
                    </p>
              </li>
            </ul>
          </div>
        </div>
        <div class="sect3" lang="en" xml:lang="en">
          <div class="titlepage">
            <div>
              <div>
                <h4 class="title"><a id="idm140526458296880"></a>Which Access Method Should You Use?</h4>
              </div>
            </div>
          </div>
          <p>
                Ultimately, you can only determine which access
                method is superior for your application through
                performance testing using both access methods. To be
                effective, this performance testing must use a
                production-equivalent workload. 
            </p>
          <p>
                That said, there are a few times when you
                absolutely must use Btree:
            </p>
          <div class="itemizedlist">
            <ul type="disc">
              <li>
                <p>
                        If you want to use bulk put and get
                        operations. 
                    </p>
              </li>
              <li>
                <p> 
                        If having your database clustered on sort
                        order is important to you.
                    </p>
              </li>
              <li>
                <p>
                        If you want to be able to create records
                        using cursors.
                    </p>
              </li>
              <li>
                <p> 
                        If you have multiple threads/processes
                        simultaneously creating new records, and you
                        want to be able to efficiently iterate over
                        those records using a cursor. 
                    </p>
              </li>
            </ul>
          </div>
          <p>
                But beyond those limitations, there are some
                application characteristics that should cause you to
                suspect that Heap will work better for your
                application than Btree. They are: 
            </p>
          <div class="itemizedlist">
            <ul type="disc">
              <li>
                <p> 
                        Your application will run in an environment
                        with constrained resources and you want to set
                        a hard limit on the size of the database file.
                    </p>
              </li>
              <li>
                <p> 
                        You want to limit the disk space growth of
                        your database file, and your application
                        performs a roughly equivalent number of record
                        creations and deletions.
                    </p>
              </li>
              <li>
                <p>
                        Inserts into a Btree require sorting the
                        new record onto its proper page. This
                        operation can require multiple page reads. A
                        Heap database can simply reuse whatever empty
                        page space it can find in the cache.
                        Insert-intensive applications will typically
                        find that Heap is much more efficient than
                        Btree, especially as the size of the database
                        increases. 
                    </p>
              </li>
            </ul>
          </div>
        </div>
      </div>
      <div class="sect2" lang="en" xml:lang="en">
        <div class="titlepage">
          <div>
            <div>
              <h3 class="title"><a id="idm140526458043648"></a>Hash or Btree?</h3>
            </div>
          </div>
        </div>
        <p> 
            The Hash and Btree access methods should be used when
            logical record numbers are not the primary key used for
            data access. (If logical record numbers are a secondary
            key used for data access, the Btree access method is a
            possible choice, as it supports simultaneous access by a
            key and a record number.) 
        </p>
        <p> 
            Keys in Btrees are stored in sorted order and the
            relationship between them is defined by that sort order.
            For this reason, the Btree access method should be used
            when there is any locality of reference among keys.
            Locality of reference means that accessing one particular
            key in the Btree implies that the application is more
            likely to access keys near to the key being accessed,
            where "near" is defined by the sort order. For example, if
            keys are timestamps, and it is likely that a request for
            an 8AM timestamp will be followed by a request for a 9AM
            timestamp, the Btree access method is generally the right
            choice. Or, for example, if the keys are names, and the
            application will want to review all entries with the same
            last name, the Btree access method is again a good choice.
        </p>
        <p>
            There is little difference in performance between the
            Hash and Btree access methods on small data sets, where
            all, or most of, the data set fits into the cache.
            However, when a data set is large enough that significant
            numbers of data pages no longer fit into the cache, then
            the Btree locality of reference described previously
            becomes important for performance reasons. For example,
            there is no locality of reference for the Hash access
            method, and so key "AAAAA" is as likely to be stored on
            the same database page with key "ZZZZZ" as with key
            "AAAAB". In the Btree access method, because items are
            sorted, key "AAAAA" is far more likely to be near key
            "AAAAB" than key "ZZZZZ". So, if the application exhibits
            locality of reference in its data requests, then the Btree
            page read into the cache to satisfy a request for key
            "AAAAA" is much more likely to be useful to satisfy
            subsequent requests from the application than the Hash
            page read into the cache to satisfy the same request. This
            means that for applications with locality of reference,
            the cache is generally much more effective for the Btree
            access method than the Hash access method, and the Btree
            access method will make many fewer I/O calls.
        </p>
        <p> 
            However, when a data set becomes even larger, the Hash
            access method can outperform the Btree access method. The
            reason for this is that Btrees contain more metadata pages
            than Hash databases. The data set can grow so large that
            metadata pages begin to dominate the cache for the Btree
            access method. If this happens, the Btree can be forced to
            do an I/O for each data request because the probability
            that any particular data page is already in the cache
            becomes quite small. Because the Hash access method has
            fewer metadata pages, its cache stays "hotter" longer in
            the presence of large data sets. In addition, once the
            data set is so large that both the Btree and Hash access
            methods are almost certainly doing an I/O for each random
            data request, the fact that Hash does not have to walk
            several internal pages as part of a key search becomes a
            performance advantage for the Hash access method as well. 
        </p>
        <p> 
            Application data access patterns strongly affect all of
            these behaviors, for example, accessing the data by
            walking a cursor through the database will greatly
            mitigate the large data set behavior describe above
            because each I/O into the cache will satisfy a fairly
            large number of subsequent data requests.
        </p>
        <p> 
            In the absence of information on application data and
            data access patterns, for small data sets either the Btree
            or Hash access methods will suffice. For data sets larger
            than the cache, we normally recommend using the Btree
            access method. If you have truly large data, then the Hash
            access method may be a better choice. The <a href="../api_reference/C/db_stat.html" class="olink">db_stat</a> utility is a
            useful tool for monitoring how well your cache is
            performing. 
        </p>
      </div>
      <div class="sect2" lang="en" xml:lang="en">
        <div class="titlepage">
          <div>
            <div>
              <h3 class="title"><a id="idm140526459252880"></a>Queue or Recno?</h3>
            </div>
          </div>
        </div>
        <p> 
            The Queue or Recno access methods should be used when
            logical record numbers are the primary key used for data
            access. The advantage of the Queue access method is that
            it performs record level locking and for this reason
            supports significantly higher levels of concurrency than
            the Recno access method. The advantage of the Recno access
            method is that it supports a number of additional features
            beyond those supported by the Queue access method, such as
            variable-length records and support for backing flat-text
            files. 
        </p>
        <p>
            Logical record numbers can be mutable or fixed:
            mutable, where logical record numbers can change as
            records are deleted or inserted, and fixed, where record
            numbers never change regardless of the database operation.
            It is possible to store and retrieve records based on
            logical record numbers in the Btree access method.
            However, those record numbers are always mutable, and as
            records are deleted or inserted, the logical record number
            for other records in the database will change. The Queue
            access method always runs in fixed mode, and logical
            record numbers never change regardless of the database
            operation. The Recno access method can be configured to
            run in either mutable or fixed mode.
        </p>
        <p>
            In addition, the Recno access method provides support
            for databases whose permanent storage is a flat text file
            and the database is used as a fast, temporary storage area
            while the data is being read or modified.
        </p>
      </div>
    </div>
    <div class="navfooter">
      <hr />
      <table width="100%" summary="Navigation footer">
        <tr>
          <td width="40%" align="left"><a accesskey="p" href="am_conf.html">Prev</a> </td>
          <td width="20%" align="center">
            <a accesskey="u" href="am_conf.html">Up</a>
          </td>
          <td width="40%" align="right"> <a accesskey="n" href="am_conf_logrec.html">Next</a></td>
        </tr>
        <tr>
          <td width="40%" align="left" valign="top">Chapter 2.  Access Method Configuration  </td>
          <td width="20%" align="center">
            <a accesskey="h" href="index.html">Home</a>
          </td>
          <td width="40%" align="right" valign="top"> Logical record numbers</td>
        </tr>
      </table>
    </div>
  </body>
</html>
